{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Inference with Yi Model Using Transformers\n",
    "\n",
    "Welcome to this tutorial on using the Hugging Face Transformers library for inference with the Yi model! In this notebook, we'll guide you step-by-step on how to load and run the Yi-1.5-6B-Chat model using Transformers. Don't worry if you're new to this – we've made the process simple and easy to follow.\n",
    "\n",
    "## Why Transformers?\n",
    "\n",
    "The Hugging Face Transformers library is a popular open-source Python library that offers:\n",
    "- A vast collection of pre-trained models\n",
    "- User-friendly APIs\n",
    "- Strong community support\n",
    "\n",
    "With Transformers, you can easily download, load, and use various models based on the Transformer architecture, including the Yi model we'll be working with today.\n",
    "\n",
    "Let's get started!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Installing the Necessary Libraries\n",
    "\n",
    "First things first, we need to install the Transformers library and other essential dependencies. Run the cell below to get everything set up:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install transformers>=4.36.2\n",
    "!pip install gradio>=4.13.0\n",
    "!pip install torch>=2.0.1,<=2.3.0\n",
    "!pip install accelerate\n",
    "!pip install sentencepiece"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's what each of these libraries does:\n",
    "- `transformers`: For loading and using the Yi model\n",
    "- `gradio`: For creating a simple web interface (if needed)\n",
    "- `torch`: The PyTorch library for deep learning computations\n",
    "- `accelerate`: To speed up model loading and inference\n",
    "- `sentencepiece`: For tokenization processing in the model\n",
    "\n",
    "Once the installation is complete, we're ready to start using the model!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Importing Libraries and Loading the Model\n",
    "\n",
    "Now, let's import the necessary libraries and load the Yi-1.5-6B-Chat model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
    "\n",
    "# Set the model path\n",
    "model_path = '01-ai/Yi-1.5-6B-Chat'\n",
    "\n",
    "# Load the tokenizer\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)\n",
    "\n",
    "# Load the model\n",
    "model = AutoModelForCausalLM.from_pretrained(\n",
    "    model_path,\n",
    "    device_map=\"auto\",  # Automatically choose available devices\n",
    "    torch_dtype='auto'  # Automatically select suitable data type\n",
    ").eval()  # Set the model to evaluation mode\n",
    "\n",
    "print(\"Model loaded successfully!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's break down what's happening in this code:\n",
    "- `AutoTokenizer` is used to load the tokenizer that matches our model.\n",
    "- `AutoModelForCausalLM` loads the language model itself.\n",
    "- `device_map=\"auto\"` allows the model to automatically choose the best device (CPU or GPU).\n",
    "- `torch_dtype='auto'` automatically selects the appropriate data type to optimize performance.\n",
    "- `.eval()` sets the model to evaluation mode, which is important for inference.\n",
    "\n",
    "⚠️ Note: Loading the model might take a bit of time, depending on your internet speed and computer performance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3: Preparing Input and Running Inference\n",
    "\n",
    "Now that our model is loaded, let's try a simple conversation!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prepare the conversation\n",
    "messages = [\n",
    "    {\"role\": \"user\", \"content\": \"Hello!\"}\n",
    "]\n",
    "\n",
    "# Convert the conversation to a format the model can understand\n",
    "input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')\n",
    "\n",
    "# Generate a response using the model\n",
    "output_ids = model.generate(input_ids.to('cuda'))\n",
    "\n",
    "# Decode the model's output\n",
    "response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)\n",
    "\n",
    "print(\"User: Hello!\")\n",
    "print(f\"Yi: {response}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's explain this code:\n",
    "\n",
    "1. We start by creating a list containing the user's message.\n",
    "2. The `apply_chat_template` method converts our messages into a format the model can understand.\n",
    "3. `model.generate` uses the converted input to generate a response.\n",
    "4. Finally, we use `tokenizer.decode` to convert the model's output back into readable text.\n",
    "\n",
    "Feel free to modify the `messages` list to try different conversations!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Advanced: Creating a Simple Chat Function\n",
    "\n",
    "To make it easier to have multi-turn conversations with the model, let's create a simple function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def chat_with_yi(user_input, history=[]):\n",
    "    # Add the new user input to the conversation history\n",
    "    history.append({\"role\": \"user\", \"content\": user_input})\n",
    "    \n",
    "    # Prepare the input\n",
    "    input_ids = tokenizer.apply_chat_template(conversation=history, tokenize=True, add_generation_prompt=True, return_tensors='pt')\n",
    "    \n",
    "    # Generate a response\n",
    "    output_ids = model.generate(input_ids.to('cuda'), max_new_tokens=100)\n",
    "    \n",
    "    # Decode the response\n",
    "    response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)\n",
    "    \n",
    "    # Add the model's response to the conversation history\n",
    "    history.append({\"role\": \"assistant\", \"content\": response})\n",
    "    \n",
    "    return response, history\n",
    "\n",
    "# Test the chat function\n",
    "history = []\n",
    "user_inputs = [\"Hello!\", \"Can you tell me a joke?\", \"Thank you, goodbye!\"]\n",
    "\n",
    "for user_input in user_inputs:\n",
    "    print(f\"User: {user_input}\")\n",
    "    response, history = chat_with_yi(user_input, history)\n",
    "    print(f\"Yi: {response}\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This function allows you to easily have multi-turn conversations while maintaining context. Feel free to add more user inputs to test the model's performance.\n",
    "\n",
    "## Conclusion\n",
    "\n",
    "Congratulations! You've successfully loaded and run the Yi-1.5-6B-Chat model using the Transformers library. Now you can:\n",
    "- Try different prompts\n",
    "- Engage in multi-turn conversations\n",
    "- Explore various capabilities of the model\n",
    "\n",
    "Remember, when using large language models:\n",
    "- The model may produce inaccurate or biased responses\n",
    "- Don't share sensitive or personal information\n",
    "- Always maintain critical thinking towards the model's outputs\n",
    "\n",
    "I hope you found this tutorial helpful! If you have any questions, don't hesitate to check the [official Transformers documentation](https://huggingface.co/docs/transformers/index) or seek help from the community. Have fun on your AI journey!"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}